53 research outputs found
Faster and Accurate Compressed Video Action Recognition Straight from the Frequency Domain
Human action recognition has become one of the most active fields of research
in computer vision due to its wide range of applications, such as surveillance,
medical and industrial environments, and smart homes. Recently, deep
learning has been successfully used to learn powerful and interpretable
features for recognizing human actions in videos. Most of the existing deep
learning approaches have been designed for processing video information as RGB
image sequences. For this reason, a preliminary decoding process is required,
since video data are often stored in a compressed format. However, decoding a
video demands a high computational load and memory usage. To
overcome this problem, we propose a deep neural network capable of learning
straight from compressed video. Our approach was evaluated on two public
benchmarks, the UCF-101 and HMDB-51 datasets, demonstrating comparable
recognition performance to the state-of-the-art methods, with the advantage of
running up to 2 times faster in terms of inference speed.
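As a rough illustration of the idea of learning straight from the frequency domain, the sketch below feeds DCT-coefficient blocks (rather than decoded RGB frames) into a small CNN; the input layout, layer sizes, and class count are illustrative assumptions, not the architecture evaluated in the paper.

```python
# Minimal sketch (not the authors' exact architecture): a 2D CNN that consumes
# DCT-coefficient blocks extracted from compressed video frames instead of
# decoded RGB pixels. The 64-channel layout (one channel per coefficient of an
# 8x8 luma block) and the layer sizes are assumptions for illustration.
import torch
import torch.nn as nn

class FrequencyDomainActionNet(nn.Module):
    def __init__(self, num_classes=101, freq_channels=64):
        super().__init__()
        # DCT blocks arrive at 1/8 of the spatial resolution, so the early,
        # most expensive convolutions of an RGB network can be skipped.
        self.features = nn.Sequential(
            nn.Conv2d(freq_channels, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, dct_blocks):
        # dct_blocks: (batch, 64, H/8, W/8) for one sampled frame
        x = self.features(dct_blocks).flatten(1)
        return self.classifier(x)

# Example: one sampled frame of a 224x224 video, already in the DCT domain.
logits = FrequencyDomainActionNet()(torch.randn(1, 64, 28, 28))
```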
Edited nearest neighbour for selecting keyframe summaries of egocentric videos
A keyframe summary of a video must be concise, comprehensive and diverse. Current video summarisation methods may not be able to enforce diversity of the summary if the events have highly similar visual content, as is the case with egocentric videos. We cast the problem of selecting a keyframe summary as a problem of prototype (instance) selection for the nearest neighbour classifier (1-nn). Assuming that the video is already segmented into events of interest (classes), and represented as a dataset in some feature space, we propose a Greedy Tabu Selector algorithm (GTS) which picks one frame to represent each class. An experiment with the UT (Egocentric) video database and seven feature representations illustrates the proposed keyframe summarisation method. GTS leads to an improved match to the user ground truth compared to the closest-to-centroid baseline summarisation method. Best results were obtained with feature spaces obtained from a convolutional neural network (CNN).
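Since the abstract frames summarisation as prototype selection for the 1-nn classifier, a simplified greedy stand-in can be sketched as below; it omits the tabu component of GTS, whose details are not given here, and assumes the feature matrix and event labels are already available.

```python
# Simplified sketch of prototype selection for the 1-nn classifier: pick one
# representative frame per event class so that the selected prototypes classify
# the remaining frames of that class as accurately as possible. This greedy
# stand-in is NOT the full GTS algorithm; it only illustrates the formulation.
import numpy as np

def greedy_prototype_summary(features, labels):
    """features: (n_frames, d) array; labels: event class of each frame.
    Returns one selected frame index per class."""
    selected = {}
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        best_idx, best_score = idx[0], -1.0
        for candidate in idx:
            # Score: fraction of this class's frames whose nearest prototype
            # (among current picks plus this candidate) carries the correct label.
            protos = list(selected.values()) + [candidate]
            proto_labels = np.array([labels[p] for p in protos])
            dists = np.linalg.norm(features[idx][:, None] - features[protos][None], axis=2)
            correct = (proto_labels[dists.argmin(axis=1)] == cls).mean()
            if correct > best_score:
                best_idx, best_score = candidate, correct
        selected[cls] = best_idx
    return sorted(selected.values())
```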
How Far Can We Get with Neural Networks Straight from JPEG?
Convolutional neural networks (CNNs) have achieved astonishing advances over
the past decade, defining state-of-the-art in several computer vision tasks.
CNNs are capable of learning robust representations of the data directly from
the RGB pixels. However, most image data are usually available in a compressed
format, of which JPEG is the most widely used for transmission and storage
purposes, demanding a preliminary decoding process that has a high
computational load and memory usage. For this reason, deep learning methods
capable of learning directly from the compressed domain have been gaining
attention in recent years. These methods adapt typical CNNs to work on the
compressed domain, but the common architectural modifications lead to an
increase in computational complexity and the number of parameters. In this
paper, we investigate the usage of CNNs that are designed to work directly with
the DCT coefficients available in JPEG compressed images, proposing
handcrafted and data-driven techniques for reducing the computational
complexity and the number of parameters of these models in order to keep their
computational cost similar to their RGB baselines. We make initial ablation
studies on a subset of ImageNet in order to analyse the impact of different
frequency ranges, image resolution, JPEG quality and classification task
difficulty on the performance of the models. Then, we evaluate the models on
the complete ImageNet dataset. Our results indicate that DCT models are capable
of obtaining good performance, and that it is possible to reduce the
computational complexity and the number of parameters from these models while
retaining a similar classification accuracy through the use of our proposed
techniques.
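As a hedged illustration of one handcrafted reduction of the kind mentioned above, the sketch below keeps only the lowest-frequency DCT channels of each 8x8 block before they reach the network; the zig-zag ordering is the standard JPEG one, but the number of channels kept is an arbitrary assumption rather than the paper's configuration.

```python
# Illustrative frequency-band selection: keep only the lowest-frequency DCT
# channels of each 8x8 block so the input tensor (and the first layer of the
# CNN) shrinks. The choice of 16 channels is an assumption for illustration.
import numpy as np

def jpeg_zigzag(n=8):
    """Block positions in JPEG zig-zag order, from low to high frequency."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 == 1 else diag[::-1])
    return order

def select_low_frequencies(dct_blocks, keep=16):
    """dct_blocks: (H/8, W/8, 64) DCT coefficients per block, e.g. as produced
    by a JPEG parser. Keeps the `keep` lowest-frequency channels."""
    order = [r * 8 + c for r, c in jpeg_zigzag()]
    return dct_blocks[..., order[:keep]]

# Example: a 224x224 luma plane has 28x28 blocks of 64 coefficients each.
reduced = select_low_frequencies(np.random.randn(28, 28, 64), keep=16)
print(reduced.shape)  # (28, 28, 16)
```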
Tightening Classification Boundaries in Open Set Domain Adaptation through Unknown Exploitation
Convolutional Neural Networks (CNNs) have brought revolutionary advances to
many research areas due to their capacity to learn from raw data. However,
when those methods are applied to non-controllable environments, many different
factors can degrade the model's expected performance, such as unlabeled
datasets with different levels of domain shift and category shift.
In particular, when both issues occur at the same time, we tackle this
challenging setup as an Open Set Domain Adaptation (OSDA) problem. In general,
existing OSDA approaches focus their efforts only on aligning known classes or,
if they already extract possible negative instances, use them as a new category
learned with supervision during the course of training. We propose a novel way
to improve OSDA approaches by extracting a high-confidence set of unknown
instances and using it as a hard constraint to tighten the classification
boundaries of OSDA methods. Specifically, we adopt a new loss constraint
evaluated in three different ways: (1) directly with the pristine negative
instances; (2) with randomly transformed negatives using data augmentation
techniques; and (3) with synthetically generated negatives containing
adversarial features. We assessed all approaches in an extensive set of
experiments based on OVANet, where we could observe consistent improvements for
two public benchmarks, the Office-31 and Office-Home datasets, yielding
absolute gains of up to 1.3% for both Accuracy and H-Score on Office-31 and
5.8% for Accuracy and 4.7% for H-Score on Office-Home.
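A minimal sketch of the unknown-exploitation idea follows; the confidence threshold for mining negatives and the binary cross-entropy form of the constraint are assumptions made here for illustration, not the exact OVANet-based formulation.

```python
# Hedged sketch: mine target samples the model is already confident are
# "unknown" and add a loss term that pushes them away from every known class.
import torch
import torch.nn.functional as F

def mine_high_confidence_unknowns(unknown_scores, threshold=0.9):
    """unknown_scores: (batch,) probability of lying outside the known classes."""
    return unknown_scores >= threshold

def unknown_exploitation_loss(known_logits, unknown_mask):
    """Push mined negatives to be rejected by all known-class classifiers.
    known_logits: (batch, num_known) one-vs-all logits."""
    if unknown_mask.sum() == 0:
        return known_logits.new_zeros(())
    neg_logits = known_logits[unknown_mask]
    # Every known class should answer "not this class" for mined unknowns.
    target = torch.zeros_like(neg_logits)
    return F.binary_cross_entropy_with_logits(neg_logits, target)
```

The same loss can be evaluated on the pristine mined negatives, on randomly augmented versions of them, or on synthetically generated negatives with adversarial features, matching the three settings listed above.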
Budget-Aware Pruning: Handling Multiple Domains with Less Parameters
Deep learning has achieved state-of-the-art performance on several computer
vision tasks and domains. Nevertheless, it still has a high computational cost
and demands a significant number of parameters. Such requirements hinder the
use in resource-limited environments and demand both software and hardware
optimization. Another limitation is that deep models are usually specialized
into a single domain or task, requiring them to learn and store new parameters
for each new one. Multi-Domain Learning (MDL) attempts to solve this problem by
learning a single model that is capable of performing well in multiple domains.
Nevertheless, the models are usually larger than the baseline for a single
domain. This work tackles both of these problems: our objective is to prune
models capable of handling multiple domains according to a user-defined budget,
making them more computationally affordable while keeping a similar
classification performance. We achieve this by encouraging all domains to use a
similar subset of filters from the baseline model, up to the amount defined by
the user's budget. Then, filters that are not used by any domain are pruned
from the network. The proposed approach innovates by better adapting to
resource-limited devices while, to our knowledge, being the only work that
handles multiple domains at test time with fewer parameters and lower
computational complexity than the baseline model for a single domain.
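A minimal sketch of the pruning step under a user-defined budget is given below; how the per-domain filter-usage masks are learned (the filter-sharing incentive) is omitted, and the budget handling shown here is an illustrative assumption rather than the paper's exact procedure.

```python
# Sketch of budget-aware filter pruning for one convolutional layer: given a
# binary filter-usage mask per domain, keep only filters used by at least one
# domain, capped at a user-defined budget (most-shared filters kept first).
import numpy as np

def select_filters_to_keep(domain_masks, budget_fraction=0.5):
    """domain_masks: (num_domains, num_filters) binary usage masks.
    Returns indices of filters to keep."""
    usage = domain_masks.sum(axis=0)   # how many domains use each filter
    used = np.where(usage > 0)[0]
    budget = int(budget_fraction * domain_masks.shape[1])
    if len(used) <= budget:
        return used
    # More filters in use than the budget allows: keep the most shared ones.
    return used[np.argsort(-usage[used])[:budget]]

# Example: 3 domains sharing a 16-filter layer with a 50% budget.
masks = (np.random.rand(3, 16) > 0.6).astype(int)
print(select_filters_to_keep(masks, 0.5))
```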
Budget-Aware Pruning for Multi-Domain Learning
Deep learning has achieved state-of-the-art performance on several computer
vision tasks and domains. Nevertheless, it still has a high computational cost
and demands a significant number of parameters. Such requirements hinder the
use in resource-limited environments and demand both software and hardware
optimization. Another limitation is that deep models are usually specialized
into a single domain or task, requiring them to learn and store new parameters
for each new one. Multi-Domain Learning (MDL) attempts to solve this problem by
learning a single model that is capable of performing well in multiple domains.
Nevertheless, the models are usually larger than the baseline for a single
domain. This work tackles both of these problems: our objective is to prune
models capable of handling multiple domains according to a user-defined budget,
making them more computationally affordable while keeping a similar
classification performance. We achieve this by encouraging all domains to use a
similar subset of filters from the baseline model, up to the amount defined by
the user's budget. Then, filters that are not used by any domain are pruned
from the network. The proposed approach innovates by better adapting to
resource-limited devices while, to our knowledge, being the only work that is
capable of handling multiple domains at test time with fewer parameters and
lower computational complexity than the baseline model for a single domain.
Climate change, in the framework of the constructal law
Here we present a simple and transparent alternative to the complex models of Earth's thermal behavior under time-changing conditions. We show the one-to-one relationship between changes in atmospheric properties and time-dependent changes in temperature and its distribution on Earth. The model accounts for convection and radiation, thermal inertia and changes in albedo (ρ) and greenhouse factor (γ). The constructal law is used as the principle that governs the evolution of flow configuration in time, and provides closure for the equations that describe the model. In the first part of the paper, the predictions are tested against the current thermal state of Earth. Next, the model showed that for two time-dependent scenarios, (δρ = 0.002; δγ = 0.011) and (δρ = 0.002; δγ = 0.005), the predicted equatorial and polar temperature increases and the time scales are (ΔT_H = 1.16 K; ΔT_L = 1.11 K; 104 years) and (0.41 K; 0.41 K; 57 years), respectively. In the second part, a continuous model of temperature variation was used to predict the thermal response of the Earth's surface for changes bounded by δρ = δγ and δρ = −δγ. The results show that the global warming amplitudes and time scales are consistent with those obtained for δρ = 0.002 and δγ = 0.005. The poleward heat current reaches its maximum in the vicinity of 35° latitude, accounting for the position of the Ferrel cell between the Hadley and Polar cells.
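For orientation only, a generic two-zone lumped energy balance of the kind the abstract describes can be written as below; this is merely a notational sketch under assumed forms for the radiative and convective terms, not the authors' constructal-law model.

```latex
% Generic two-zone lumped energy balance, written only to fix notation;
% NOT the authors' exact formulation.
% T_H, T_L: equatorial and polar surface temperatures; C_H, C_L: thermal inertias;
% S_H, S_L: absorbed-side solar irradiation; rho: albedo; gamma: greenhouse factor;
% q: convective poleward heat current; sigma: Stefan-Boltzmann constant.
\[
\begin{aligned}
C_H \frac{dT_H}{dt} &= (1-\rho)\, S_H - (1-\gamma)\,\sigma T_H^{4} - q, \\
C_L \frac{dT_L}{dt} &= (1-\rho)\, S_L - (1-\gamma)\,\sigma T_L^{4} + q .
\end{aligned}
\]
```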
Multimedia geocoding: the RECOD 2014 approach
This work describes the approach proposed by the RECOD team for the Placing Task of MediaEval 2014. This task requires the definition of automatic schemes to assign geographical locations to images and videos. Our approach is based on the use of as much evidence as possible (textual, visual, and/or audio descriptors) to geocode a given image/video. We estimate the location of test items by clustering the geographic coordinates of top-ranked items in one or more ranked lists defined in terms of different criteria.
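A minimal sketch of the location-estimation step could look like the code below; the choice of DBSCAN as the clustering method, its parameters, and the fallback to the best-ranked item are assumptions for illustration, not the specific clustering used by RECOD.

```python
# Sketch: take the (lat, lon) coordinates of the top-ranked database items,
# cluster them, and predict the centre of the largest cluster.
import numpy as np
from sklearn.cluster import DBSCAN

def estimate_location(top_ranked_coords, eps_degrees=1.0, min_samples=2):
    """top_ranked_coords: (k, 2) array of (lat, lon) of top-ranked items."""
    labels = DBSCAN(eps=eps_degrees, min_samples=min_samples).fit_predict(top_ranked_coords)
    valid = labels[labels >= 0]
    if valid.size == 0:
        # No dense cluster: fall back to the single best-ranked item.
        return top_ranked_coords[0]
    biggest = np.bincount(valid).argmax()
    return top_ranked_coords[labels == biggest].mean(axis=0)

# Example with five top-ranked items, three of them near the same city.
coords = np.array([[48.85, 2.35], [48.86, 2.34], [48.84, 2.36],
                   [40.71, -74.0], [35.68, 139.7]])
print(estimate_location(coords))
```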
- …